Abstract: The design of the Alphard programming language has been strongly influenced by
ideas from the areas of programming methodology and formal program verification. In this
paper we design, implement, and verify a general symbol table mechanism. This example is
rich enough to allow us to illustrate the use as well as the definition of programmer-defined
abstractions. The verification illustrates the power of the form to simplify proofs by providing
strong specifications of such abstractions.
Introduction: The Symbol Table Task


Previous  reports  [Shaw76b,  Wulf76a,b]  have  described  the  Alphard  programming 
language  and  its  associated  verification  methodology.  These  reports  developed  Alphard 
definitions  for  the  canonical  examples  of  data  abstractions  (stacks,  queues,  and  sets).  These 
examples  are  sufficiently  simple  to  be  grasped  readily,  and  they  have  appeared  often  enough 
in  other  languages  that  the  reader  may  compare  various  approaches  to  their  definition.  There 
is,  however,  a danger  in  considering  only  these  examples.  It  is  possible  that  an  approach  will 
work  for  only  the  easy  examples,  or  that  the  definition  of  something  more  complex  will  be  far 
less  elegant. 

Therefore,  in  this  report  we  shall  consider  a larger,  more  realistic  example:  an 
abstraction  of  a symbol  table.  For  comparison  purposes  the  reader  may  wish  to  refer  to  the 
similar  example  given  in  [Guttag76]  and  to  a hashtable  example  in  [Wegbreit76]. 

Suppose that we must produce a number of compilers, assemblers, and interpreters to
operate on several different computers. Each such system will contain a symbol table
mechanism; although each system will have its own requirements, many of the gross, abstract
properties of these symbol tables will be the same. It seems desirable to have a single
implementation of these common aspects which is verified; that will be our aim.


But what are the common properties? Many texts [e.g., Gries71] describe a symbol
table as a mapping from identifiers (strings appearing in a source program) to a set of
attributes associated with those identifiers. Examples of such attributes include "type",
"run-time memory address", "number of dimensions" (for arrays), etc. In some cases, the mapping
may  be  sensitive  to  the  context  in  which  the  identifier  occurs.  (Algol-like  block  structure  is 
the  most  common  example  of  this  context  sensitivity;  the  mapping  from  identifier  to  attributes 
depends  upon  the  block  in  which  the  identifier  appears.  Name  qualification,  as  in  field 
selection  from  a record,  is  another  example  in  which  the  interpretation  of  the  field  selector 
depends  upon  the  type  of  the  record.)  The  common  properties,  then,  are  ones  which  involve 
the  application  and  manipulation  of  this  mapping;  principally 
- some means to apply the mapping, i.e., to find the attributes associated with the
  occurrence of an identifier.

- some means to alter the mapping, e.g., by inserting and/or deleting entries and
  signaling changes in context.

Since we want our abstraction to serve a spectrum of languages, system types (e.g.,
compilers and assemblers), and machines, it would not be appropriate to include the specific
attributes as part of the abstraction. Rather, we shall presume that the user of our
abstraction will define some mechanism for storing and retrieving attributes, e.g., a vector of
records; our abstraction will then provide a mapping from an identifier to a unique integer
which, for example, may then be used as an index into this vector of attribute records.

Concerning  the  issue  of  context  sensitivity,  we  shall  provide  an  abstraction  which 
supports  block  structure  because  (1)  it  is  the  more  general  case  and  (2)  with  proper 
implementation, the generality costs very little when it is not used. We shall not explicitly
provide for the kind of context sensitivity needed for record selectors, but we shall show how
the abstraction may be used to achieve it.

Note  that  the  informal  term  "block-structured"  does  not  describe  a unique  name-binding 
policy.  For  example,  consider  the  program  fragment 


integer k=10;
begin

vector X[1:k];
integer k=3;


In  the  declaration  of  the  vector  "X",  there  is  a question  about  which  "k"  should  be  used  to 
define  its  upper  bound.  The  semantics  of  some  languages  specify  that  the  value  of  the 
variable  "k"  defined  at  the  outer  block  level,  i.e.,  10,  should  be  used;  other  languages  specify 
that it is the innermost definition, i.e., "integer k=3", which should be used. To accommodate
the  second  of  these  schemes  requires  that  a full  lexical  analysis  pass  be  performed  before 
any  name  binding  (symbol  table  construction)  is  done. 

In order to make our abstraction useful on this pure lexical pass, as well as later when
the full symbol table is constructed, we shall define it as a mapping between "things" and
integers. In a simple system the "things" will be identifiers and the integers will probably be
indices into the vector of attributes described above. In a more complex system, the initial
lexical pass may use the abstraction to convert identifiers into integers; these integers may in
turn be the "things" mapped into symbol table indices during a later pass. An example of the
use of the abstraction will be given later to help clarify this point; for the moment the reader
may simply assume that the "things" are identifiers.

Summarizing,  then,  our  abstraction  shall  provide: 

(a)  A block-structured  mapping  from  "things"  to  integers. 

(b) A set of six operations: to insert a new "thing", to look up the integer
    associated with a specific "thing", to test whether a specific "thing" is
    defined at the current block level, to enter and to leave a block level, and to
    test whether the mapping is full, i.e., whether there is room for another
    "thing".


The  Symbol  Table  Abstraction 


The preceding section provides an informal description of the symbol table abstraction;
in this section we shall be more precise. Specifically, the specifications part of the form called
"symtab" is:^

form symtab(T:form< ←,=,hash(T,k:integer) returns x:integer pre (k>0) post (0≤x<k') >,
            m,n:integer) =
  beginform
  specifications

    requires n≥1 ∧ m≥1;

    let symtab = <block:integer, assoc:{<s:T,bl:integer,ui:integer>}>;
    invariant
      cardinality(assoc)≤n
      ∧ 1≤ui≤n ∧ 1≤bl≤block
      ∧ (t1,t2 ∈ assoc ⊃ (t1.s=t2.s ∧ t1.bl=t2.bl ≡ t1.ui=t2.ui));
    initially symtab = <1,{}>;
    functions

      defined(st:symtab,str:T) returns t:boolean
        post t = ∃i st <str,st.block',i> ∈ st.assoc,
      insert(st:symtab,str:T) returns i:integer
        pre cardinality(st.assoc) < n ∧ ¬defined(st,str)
        post st = <st.block', st.assoc' ∪ {<str,st.block',i>}>,
      lookup(st:symtab,str:T) returns x:integer
        post if ∃y ∈ st.assoc st [y.s=str ∧ ∀z ∈ st.assoc, z.s=str ⊃ z.bl ≤ y.bl]
             then x = y.ui
             else x = 0,
      enterblock(st:symtab)
        post st = <st.block'+1, st.assoc'>,
      leaveblock(st:symtab)
        pre st.block > 1
        post st = <st.block'-1, st.assoc' - {<s,x,ui> st x≥st.block'}>,
      full(st:symtab) returns t:boolean
        post t = (cardinality(st.assoc) = n);


^ A primed variable (e.g., k') represents the value of that variable prior to the
execution of an operation. To shorten the pre, post, in, and out conditions in our papers, we
often, by convention, omit assertions about variables which are completely unchanged. Thus
for example, we have omitted st = st' from the post condition of defined.



Note  that,  abstractly,  a symbol  table  consists  of  a pair,  an  integer,  "block",  and  a set, 
"assoc".  The  integer  denotes  the  current  block  level,  has  the  initial  value  1,  and  is  altered 
only by the operations enterblock and leaveblock. The set, initially empty, consists of triples
containing the "thing" defined, the block level at which it was defined, and the unique integer
("ui") associated with the <thing, block level> pair.

The  parameters  of  the  form  specify  the  type,  "T"  (usually  strings),  of  "things"  to  be 
entered  in  the  table,  and  the  maximum  number,  "n",  of  simultaneous  entries  permitted.  The 
parameter  "m"  is  a bit  more  difficult  to  explain,  and  we  shall  for  a moment  defer  it,  together 
with  the  discussion  of  the  required  rights  of  T. 

Since  the  symbol  table  contains  only  currently  defined  things,  the  block  level  of  each 
entry must be legitimate (i.e., between 1 and the current value of "block"). Further, since a
maximum of n entries is allowed, the "associated integer" must lie between 1 and n. The let
clause and the abstract invariant express these restrictions (the last line of the invariant
expresses the uniqueness of the integer associations). The remainder of the specifications
states  that  the  initial  symbol  table  has  a block  level  of  1 and  an  empty  "assoc"  set,  and  then 
lists  the  symbol  table  functions  and  their  abstract  pre  and  post  conditions. 
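
The abstract specification can be read almost directly as an executable model. The following Python sketch is only an illustration of that reading, not part of the Alphard definition: the class name AbstractSymtab is hypothetical, the set of triples is held in a Python set, and the choice of the smallest unused integer in insert is an assumption (the specification only requires that the integer be unique and lie between 1 and n).

```python
class AbstractSymtab:
    """Model of the abstract pair <block, assoc> from the specifications part."""

    def __init__(self, n):
        assert n >= 1                      # requires n >= 1 (m plays no abstract role)
        self.n = n
        self.block = 1                     # initially symtab = <1, {}>
        self.assoc = set()                 # triples (thing, block level, unique integer)

    def defined(self, s):
        # True iff s has a triple at the current block level.
        return any(t == s and bl == self.block for (t, bl, _) in self.assoc)

    def full(self):
        return len(self.assoc) == self.n

    def insert(self, s):
        assert len(self.assoc) < self.n and not self.defined(s)    # pre condition
        used = {ui for (_, _, ui) in self.assoc}
        # Assumption: pick the smallest free integer; any unused value in 1..n would do.
        ui = min(i for i in range(1, self.n + 1) if i not in used)
        self.assoc.add((s, self.block, ui))
        return ui

    def lookup(self, s):
        # Integer of the innermost (highest block level) triple for s, or 0 if none.
        matches = [(bl, ui) for (t, bl, ui) in self.assoc if t == s]
        return max(matches)[1] if matches else 0

    def enterblock(self):
        self.block += 1

    def leaveblock(self):
        assert self.block > 1                                        # pre condition
        # Drop every triple declared at the block being left, then step outward.
        self.assoc = {(t, bl, ui) for (t, bl, ui) in self.assoc if bl < self.block}
        self.block -= 1


st = AbstractSymtab(n=4)
outer = st.insert("k")      # declared at block level 1
st.enterblock()
inner = st.insert("k")      # redeclared at block level 2
innermost = st.lookup("k")  # the inner declaration hides the outer one
st.leaveblock()
after_leave = st.lookup("k")
```

In this usage the inner declaration is found while block 2 is open, and the outer one becomes visible again after leaveblock, which is the block-structured behaviour the post conditions describe.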

Now,  let  us  return  to  the  issue  of  the  parameter  m and  the  required  rights  on  T.  As 
may  be  seen  from  the  requires  clause  of  the  specifications,  the  only  requirement  on  m is  that 
its  value  be  strictly  positive;  it  does  not  enter  into  any  of  the  other  parts  of  the  formal 
specification.  Hence,  one  may  properly  conclude  that  its  precise  value  is  immaterial  and  the 
abstraction  will  function  correctly  with  any  positive  value. 

The  value  of  m does,  however,  affect  the  performance  of  the  abstraction.  Neither 
Alphard  nor  other  languages  with  similar  goals  have  yet  found  an  appropriate  way  to  specify 
performance  properties.  In  practical  systems,  of  course,  such  properties  are  of  paramount 
importance.  Since  we  now  have  no  formal  way  of  specifying  them,  we  must  give  a small  peek 
into  the  representation  in  order  to  explain  the  significance  of  m.  (Indeed,  the  need  to  have  m 
and  the  hash  function  name  in  the  specifications  has  essentially  revealed  the  techniques  used 
in  the  implementation  of  the  abstraction.)  The  representation  uses  a hash  table,  with  collisions 
resolved  by  chaining,  and  m specifies  the  size  of  this  table,  i.e.,  the  number  of  values  that  the 
hash  function  may  assume.  Although  any  positive  value  of  m will  work,  larger  values  will  tend 
to  provide  faster  searches  at  the  expense  of  some  additional  storage. 

In  addition,  the  value  of  m may  affect  the  distribution  of  "hits"  on  any  particular  hash 
table  entry;  see  [Knuth73]  for  a discussion  of  hashing  functions  and  their  properties.  We  will 
not  discuss  these  properties  here,  but  note  that  the  form  T which  defines  the  things  stored  in 
the  symbol  table  is  required  to  provide  a hashing  function  which,  given  an  object  of  type  T 
and an integer k, returns an integer in the range 0 to k-1. Thus, an appropriate choice of m
depends  in  part  on  the  properties  of  this  function. 
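
As a rough illustration of the hash requirement placed on T, the Python sketch below shows one possible function for string "things". The name hash_thing and the multiplier 31 are assumptions made for the example; the paper only requires that the result lie in the range 0 to k-1 whenever k>0.

```python
def hash_thing(s: str, k: int) -> int:
    """Map a string to an integer in 0..k-1 (pre: k > 0, post: 0 <= x < k)."""
    assert k > 0
    h = 0
    for ch in s:
        # Polynomial accumulation; reducing modulo k at each step keeps h in range.
        h = (h * 31 + ord(ch)) % k
    return h


# With k = 1 every symbol lands in the same chain, which is still correct
# but degenerates into a single linear list.
buckets_small = {hash_thing(w, 1) for w in ["alpha", "beta", "gamma"]}
buckets_large = [hash_thing(w, 101) for w in ["alpha", "beta", "gamma"]]
```

The example with k=1 mirrors the remark that any positive m works; a larger m only changes how the symbols are spread over the chains.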


Implementation  of  Symbol  Table 


In  choosing  the  implementation  of  the  symbol  table  abstraction,  we  have  been  careful  to 
pick  a practical  one;  it  is,  in  fact,  one  which  is  used  in  several  commercial  compilers.  We  chose 
to  do  this  rather  than,  for  example,  to  use  a direct  implementation  in  terms  of  sets  (e.g.,  the 
simpleset form defined in [Shaw76b]). We have done this in order to emphasize that both the
language  and  verification  methodology  are  intended  to  be  used  for  practical,  production  quality 
systems.  The  more  direct  implementation,  and  also  its  proof,  would  have  been  straightforward 
and  clear.  However,  it  would  not  have  been  a production  quality  implementation  and  thus 
would  not  have  been  useful  in  a real  system.  We  shall  comment  on  this  point  further  in  the 
conclusion,  but  we  feel  strongly  that  language,  methodology,  and  verification  must  respond  to 
the  requirements  of  practical,  efficient  systems. 

We  shall  obtain  the  implementation  in  two  steps.  We  shall  define  an  intermediate 
abstraction  (form)  in  the  process  of  obtaining  the  complete  implementation.  This  intermediate 
abstraction  will  support  a restricted,  but  not  uncommon,  style  of  list-processing. 

Now,  whenever  a system  implementation  is  described,  one  is  faced  with  a presentation 
problem: whether the description should be "top-down" or "bottom-up". Both have
advantages. In this case we have chosen to make the presentation predominantly top-down,
primarily  to  emphasize  that  the  implementation  of  lower  level  abstractions  is  irrelevant  to  the 
correctness  of  the  higher  level  ones.  The  next  paragraph,  however,  is  an  exception  to  the 
predominant  flavor  of  the  presentation;  it  describes  the  implementation  of  the  symbol  table  in 
low-level  terms,  as  it  will  exist  after  compilation  of  the  forms.  It  is  included  for  those  of  us 
(including  the  authors)  who  still  need  concrete  representations  to  aid  their  reasoning;  purists 
may  simply  skip  the  next  paragraph. 

The  symbol  table  will  be  implemented  as  a hash  table  with  explicit  entries  for  the 
symbol  and  its  declaration  block  level,  but  an  implicit  encoding  of  the  integer  mapping.  Hash 
collisions  are  resolved  by  associating  a linked  list  of  symbol  table  entries  with  each  value  of 
the  hash  function.  Each  new  entry  is  inserted  at  the  head  of  the  appropriate  list.  The  entries 
on  the  lists  are  therefore  ordered  by  block  level  (innermost  block  first).  To  find  the  innermost 
instance  of  a symbol,  lookup  need  only  perform  a linear  search  of  the  list  associated  with  the 
hash  value  of  the  symbol;  the  first  instance  of  the  symbol  in  the  list  is  necessarily  the  one 
declared at the innermost block level. It is a simple matter for leaveblock to delete the proper
entries  from  the  heads  of  these  lists. 

The implementation of symtab presumes the existence of a form called "condis"
(collection of named, disjoint integer sequences). The explanation of the symtab
implementation will require that we first understand (i.e., specify) condis. Although condis is
intended to support a group of linear lists, its abstract specification is stated in terms of more
mathematically tractable entities, namely sets and sequences.^ The verification of symtab will
use the abstract specification from condis but nothing else. The verification of condis will be
independent of symtab and its verification. The specifications part of condis is:

form condis(n,m:integer) =
  beginform
  specifications

    requires n≥1 ∧ m≥1;

    let condis = {sq_i:<e_i1,e_i2,...,e_ik> | 0≤i≤m-1 ∧ e_ij ∈ integer};

    invariant 1≤e_jk≤n ∧ ∀i,j ∈ [0..m-1] (e_ik1 = e_jk2 ⊃ i=j ∧ k1=k2);

    initially ∀i ∈ [0..m-1] sq_i = <>;

    functions

      xtnd(s:condis,i:integer) returns j:integer
        pre i ∈ [0..m-1] ∧ SIGMA_{j∈[0..m-1]} length(s.sq_j) < n,
        post s.sq_i = <j>~s.sq_i',   ! note j is a new value not in any sq (by Ia)

      del(s:condis,i,j:integer)
        pre s.sq_i = <...,j,...> ∧ i ∈ [0..m-1]
        post s.sq_i = <j,...>,

      delall(s:condis,i:integer)
        pre i ∈ [0..m-1]
        post s.sq_i = <>,

      full(s:condis) returns t:boolean
        post t = (SIGMA_{j∈[0..m-1]} length(s.sq_j) = n);

    generator indis(s:condis,i:integer) extends x:integer
      requires 0≤i≤m-1
      let indis = s.sq_i where indis≠<> ⊃
                  (indis = c~<x>~d and c, <x>, and d are disjoint);

      rule for(I, x, <s,i>, ST) =
        premise s.sq_i = c~<x>~d ∧ I(c) {ST} I(c~<x>);

      rule first(P, x, <s,i>, β, S1, S2, Q) =
        premise s.sq_i = c~<x>~d ∧ P ∧ ∀y ∈ c(¬β(y)) ∧ β(x) {S1} Q,
        premise P ∧ ∀y ∈ s.sq_i ¬β(y) {S2} Q;

    auxiliary predicates

      follows(s:condis,i,j:integer) =def ∃k st sq_k = <...,i,...,j,...>,

      mbr(s:condis,i,j:integer) =def sq_i = <...,j,...>;


A condis is abstractly described as a set of precisely m sequences of integers; these
sequences are named sq_0 through sq_(m-1). The abstract invariant asserts that: (1) each integer
in any of the sequences lies in the range 1 to n and (2) a particular integer appears as a
sequence element at most once in the entire set of sequences. From these two facts we can
observe that the sum of the lengths of the sequences is at most n; moreover, in the case that
this sum is n, each of the integers 1 through n will appear (precisely once) in one of the
sequences.

^ Definitions and properties of sets appear in [Halmos60] and those of sequences in
[Wulf76a,b].

As  a practical  matter,  each  of  the  sequences  in  the  condis  will  represent  a linear  list; 
specifically, sq_i will be associated with the value i produced by the hash function. The
sequence elements will be the (integer) indices into a vector of information within symtab; thus
the sequence sq_i (and the corresponding entries in the vector of information) will represent
the  linear  list  of  triples  in  the  abstract  "assoc"  set  of  symtab  which  have  the  hash  function 
value  i. 

Four  functions  and  a generator  are  provided  by  the  condis  form.  Function  xtnd  extends 
the  head  of  a specified  sequence  by  one  element;  the  abstract  invariant  prevents  this  integer 
from  being  one  which  already  appears  in  some  sequence.  Function  del  permits  the  initial 
elements  of  a specified  sequence  to  be  deleted,  and  function  delall  permits  all  the  elements  of 
a specified  sequence  to  be  deleted.  Function  full  tests  whether  all  of  the  integers  already  are 
in some sequence. Generator indis produces the elements of a specified sequence in order,
starting with the head. The specification of condis also gives two auxiliary predicates (follows
and mbr). These may be used in proofs, but are not actually implemented as executable
functions;  they  should  be  viewed  as  an  extension  to  the  abstract  vocabulary. 
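
To make the behaviour of the four functions and the generator concrete, here is a small Python model of condis. It is a sketch under stated assumptions rather than the representation used in the paper: the free list, the method name delete (del is a reserved word in Python), and the policy of handing out the smallest free integer first are all choices made for this example. The auxiliary predicates are executable here only for convenience; in the paper they exist purely for proofs.

```python
class Condis:
    """Collection of m named, disjoint sequences over the integers 1..n."""

    def __init__(self, n, m):
        assert n >= 1 and m >= 1              # requires n >= 1 and m >= 1
        self.n, self.m = n, m
        self.sq = [[] for _ in range(m)]      # sq[0..m-1]; index 0 of each list is the head
        self.free = list(range(n, 0, -1))     # hypothetical free list; pop() yields 1 first

    def full(self):
        return not self.free                  # every integer 1..n is in some sequence

    def xtnd(self, i):
        assert 0 <= i < self.m and not self.full()
        j = self.free.pop()                   # a value not in any sequence
        self.sq[i].insert(0, j)               # extend sequence i at its head
        return j

    def delete(self, i, j):
        """The paper's del: drop the elements of sq[i] that precede j."""
        assert 0 <= i < self.m and j in self.sq[i]
        k = self.sq[i].index(j)
        self.free.extend(self.sq[i][:k])
        del self.sq[i][:k]

    def delall(self, i):
        assert 0 <= i < self.m
        self.free.extend(self.sq[i])
        self.sq[i].clear()

    def indis(self, i):
        """Generate the elements of sq[i] in order, starting with the head."""
        assert 0 <= i < self.m
        return iter(list(self.sq[i]))         # snapshot, so callers may modify sq[i] afterwards

    def mbr(self, i, j):
        return j in self.sq[i]

    def follows(self, i, j):
        # True iff some sequence contains i at or before the position of j.
        return any(i in s and j in s and s.index(i) <= s.index(j) for s in self.sq)


c = Condis(n=3, m=2)
a = c.xtnd(0)
b = c.xtnd(0)               # inserted ahead of a
d = c.xtnd(1)
was_full = c.full()
b_before_a = c.follows(b, a)
c.delete(0, a)              # removes b, the only element ahead of a
e = c.xtnd(1)               # the freed integer is reused
```

The last two steps show the free-storage role discussed later in the text: deleting from a sequence makes its integers available to a subsequent xtnd.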

At  first  sight,  the  condis  abstraction  may  seem  unusual;  however,  we  chose  to  define  it 
in  this  way  for  two  reasons: 

- By using integers to denote elements, we can obtain an efficient encoding of the
  unique integer mapping required by symtab. This encoding is one which
  might be selected in actual practice.

- This definition allows us to skirt the issue of pointers (references) for purposes
  of this paper.^


Now  we  can  present  the  complete  definition  of  the  symtab  form. 


^ As  most  people  who  have  followed  the  recent  literature  on  programming  methodology 
and  verification  are  aware,  the  presence  of  references  (unconstrained  pointers)  in  a 
programming  language  interferes  with  our  ability  to  understand  and  verify  programs  that  use 
them.  While  we  believe  we  have  made  significant  progress  in  Alphard  toward  resolving  the 
problems  introduced  by  the  unconstrained  pointer,  we  will  not  complicate  this  paper  with 
pointer  issues. 
form symtab(T:form< ←,=,hash(T,k:integer) returns x:integer pre (k>0) post (0≤x<k') >,
            m,n:integer) =
  beginform
  specifications

    requires n≥1 ∧ m≥1;

    let symtab = <block:integer, assoc:{<s:T,bl:integer,ui:integer>}>;
    invariant
      cardinality(assoc)≤n
      ∧ 1≤ui≤n ∧ 1≤bl≤block
      ∧ (t1,t2 ∈ assoc ⊃ (t1.s=t2.s ∧ t1.bl=t2.bl ≡ t1.ui=t2.ui));
    initially symtab = <1,{}>;
    functions

      defined(st:symtab,str:T) returns t:boolean
        post t = ∃i st <str,st.block',i> ∈ st.assoc,
      insert(st:symtab,str:T) returns i:integer
        pre cardinality(st.assoc) < n ∧ ¬defined(st,str)
        post st = <st.block', st.assoc' ∪ {<str,st.block',i>}>,
      lookup(st:symtab,str:T) returns x:integer
        post if ∃y ∈ st.assoc st [y.s=str ∧ ∀z ∈ st.assoc, z.s=str ⊃ z.bl ≤ y.bl]
             then x = y.ui
             else x = 0,
      enterblock(st:symtab)
        post st = <st.block'+1, st.assoc'>,
      leaveblock(st:symtab)
        pre st.block > 1
        post st = <st.block'-1, st.assoc' - {<s,x,ui> st x≥st.block'}>,
      full(st:symtab) returns t:boolean
        post t = (cardinality(st.assoc) = n);

  representation

    unique
      blvl: integer,
      info: vector(record(s:T,bl:integer),1,n),
      as: condis(n,m)
    init blvl ← 1;

    rep(as,info,blvl) = <blvl, {<info[i].s,info[i].bl,i> | ∃j ∈ [0..m-1] st mbr(as,j,i)}>;
    invariant
      (mbr(as,i,j) ⊃ hash(info[j].s,m) = i)
      ∧ (follows(as,i,j) ⊃ blvl ≥ info[i].bl ≥ info[j].bl ≥ 1 ∧ (info[i]=info[j] ⊃ i=j))

  implementation

    body defined out (t = ∃j st st.info[j]=<str,st.blvl> ∧ mbr(st.as,hash(str,m),j)) =
      first j:indis(st.as,hash(str,m)) suchthat st.info[j].s=str
        then t ← st.info[j].bl=st.blvl else t ← false;

    body insert in ¬full(st.as) ∧ ¬defined(st,str)
      out (st.info[i]=<str,st.blvl> ∧ sq_hash(str,m) = <i>~sq'_hash(str,m)) =
      begin
        i ← xtnd(st.as,hash(str,m));
        st.info[i] ← <str,st.blvl>;
      end;

    body lookup out ((x=0 ⊃ (j ∈ [1..n] ∧ ∃i ∈ [0..m-1](mbr(st.as,i,j)) ⊃ st.info[j].s ≠ str)) ∧
                     (x>0 ⊃ st.info[x].s=str ∧ (st.info[j].s=str ⊃ j=x ∨ st.info[x].bl > st.info[j].bl))) =
      first j:indis(st.as,hash(str,m)) suchthat st.info[j].s=str
        then x ← j else x ← 0;

    body enterblock out (st.blvl = st.blvl' + 1) =
      st.blvl ← st.blvl+1;

    body leaveblock in st.blvl > 1
      out (st.blvl = st.blvl' - 1 ∧ (j ∈ [1..n] ∧ i ∈ [0..m-1] ⊃
            (mbr(st.as,i,j) ≡ mbr(st.as',i,j) ∧ st.info[j].bl < st.blvl'))) =
      begin
        st.blvl ← st.blvl-1;
        for i: upto(0,m-1) do   ! the generator upto is defined in [Shaw76b]
          first j:indis(st.as,i) suchthat st.info[j].bl ≤ st.blvl
            then del(st.as,i,j) else delall(st.as,i);
      end;

    body full out (t = (SIGMA_{j∈[0..m-1]} length(s.sq_j) = n)) =
      t ← full(st.as);
endform


Note that the representation of a symtab consists of three objects: (1) blvl, an integer,
is a direct representation of the abstract entity block, and is initialized to 1. (2) info is a
vector of records which hold the "thing" (usually a string) and the block level at which it was
declared. Each of these records is, in effect, one of the triples in the abstract "assoc" set; the
third element of the triple, the unique integer, is not explicitly represented; rather, it is
implicitly encoded as the index of this record in the vector. (3) as is a condis, and as
explained above, it represents a set of lists of indices into this vector of records; each such
list is uniquely associated with a hash function value.

A point which may not be obvious is worth noting. It is rare that all info entries will be
in use; we thus have a potential problem in maintaining the free storage of this vector. This
problem is handled by the condis abstraction. The uniqueness of the integers in condis
sequences guarantees that no info entry will be used simultaneously by different members of
assoc. In essence, the integer values which are in the condis sequences correspond to
occupied entries, and all other integers in the range 1 to n correspond to unoccupied, or free,
entries. Specifically, the abstract invariant of condis and the post condition of xtnd together
provide a safe allocation of new info entries. Similarly, del and delall provide a safe
deallocation mechanism.

To  illustrate  the  operation  of  the  implementation,  consider  the  interaction  of  the  bodies 
of insert and lookup. When a new symbol is to be inserted, we first invoke the condis
operation xtnd. This has the effect of extending the head of the sequence associated with the
hash value of the symbol by a new, unique, integer. This integer is then used as the index
into the vector info and the symbol and current block level are recorded in this entry. When
a later lookup is performed on this symbol, the indis generator is used to find the first
integer, j, in the sequence associated with the hash value of the symbol for which "info[j].s"
matches.  Since  xtnd  extends  the  sequence  at  its  head,  this  match  is  necessarily  the  most 
recently  declared  instance  of  the  symbol. 
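
The interaction of insert, lookup, and leaveblock can be traced in the following Python sketch of the representation. It is an approximation for illustration only: the chains and free list are inlined as a simplified stand-in for the condis form, and the class name Symtab, the helper _first, and the example hash function h are assumptions that do not appear in the paper.

```python
class Symtab:
    """Hash table with chaining; the unique integer is the index into info."""

    def __init__(self, n, m, hash_fn):
        assert n >= 1 and m >= 1
        self.n, self.m, self.hash = n, m, hash_fn
        self.blvl = 1                          # init blvl <- 1
        self.info = [None] * (n + 1)           # info[1..n] holds (s, bl); index 0 unused
        self.sq = [[] for _ in range(m)]       # stand-in for as: condis(n, m)
        self.free = list(range(n, 0, -1))      # integers not in any sequence

    def full(self):
        return not self.free

    def _first(self, s):
        # Mirrors "first j:indis(...) suchthat info[j].s = s"; returns 0 if no match.
        for j in self.sq[self.hash(s, self.m)]:
            if self.info[j][0] == s:
                return j
        return 0

    def defined(self, s):
        j = self._first(s)
        return j != 0 and self.info[j][1] == self.blvl

    def insert(self, s):
        assert not self.full() and not self.defined(s)
        i = self.free.pop()                            # xtnd: new, unique integer ...
        self.sq[self.hash(s, self.m)].insert(0, i)     # ... placed at the head of the chain
        self.info[i] = (s, self.blvl)
        return i

    def lookup(self, s):
        return self._first(s)      # head-first search finds the innermost declaration

    def enterblock(self):
        self.blvl += 1

    def leaveblock(self):
        assert self.blvl > 1
        self.blvl -= 1
        for chain in self.sq:
            # Entries of the closed block sit at the head; find the first survivor.
            keep = next((k for k, j in enumerate(chain)
                         if self.info[j][1] <= self.blvl), len(chain))
            self.free.extend(chain[:keep])     # del, or delall when nothing survives
            del chain[:keep]


def h(s, k):
    # Hypothetical hash for the example: character sum modulo k.
    return sum(map(ord, s)) % k


st = Symtab(n=4, m=2, hash_fn=h)
k_outer = st.insert("k")
st.enterblock()
k_inner = st.insert("k")
x_inner = st.insert("x")
seen_inside = st.lookup("k")
st.leaveblock()
seen_outside = (st.lookup("k"), st.lookup("x"))
reused = st.insert("y")      # takes an index released by leaveblock
```

Because each new entry goes to the head of its chain, the first match found by the search is the most recent declaration, and leaveblock only ever has to remove a prefix of each chain.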


Verification  of  the  form  Symtab 

A form  is  verified  by  proving  four  properties  as  described  in  [Wulf76a,b]  and 
summarized  in  Appendix  A.  As  promised  earlier,  the  verification  below  uses  only  the  abstract 
specification  of  the  form  condis,  including  the  auxiliary  predicates.  The  implementation  of 
condis  is,  as  desired,  irrelevant  to  symtab.  All  uses  of  the  generator  indis  satisfy  the 
independence  assumption  provided  that  in  leaveblock  we  regard  both  the  then  and  else 
clauses  as  being  outside  the  first  generator.^ 


^ Strictly speaking, this violates the definition of the first statement in [Shaw76b], a
definition which we must modify to permit, for example, finalization statements and the
leaveblock usage. We must also weaken the independence assumption. With the strict
interpretation, however, an ad hoc argument shows that there are no problems in this case
because indis does not modify the generated sequence and no further generation is attempted
after the then and else clauses.

For the form

1. Representation validity

   Show:  Ic(as,info,blvl) ⊃ Ia(rep(as,info,blvl))

   Proof: cardinality(assoc) ≤ n follows from Ia for condis, namely, 1≤e_jk≤n
          and no duplicate e_jk's means at most n elements in assoc. The relation
          1≤ui≤n holds because of mbr in the rep function and 1≤e_jk≤n in Ia for
          condis. The relation 1≤bl≤block follows by setting j=i in follows(as,i,j)
          in Ic. To show uniqueness in assoc, first note that identical s and
          identical bl means, letting hash(t1.s,m) = hash(t2.s,m) = k, that
          mbr(as,k,t1.ui) and mbr(as,k,t2.ui), whence we have either
          follows(as,t1.ui,t2.ui) or follows(as,t2.ui,t1.ui). In either case, since
          info[t1.ui]=info[t2.ui], then t1.ui = t2.ui as required. The converse of
          the uniqueness clause holds since Ia for condis means no duplicates.
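
The representation function and the abstract invariant in this step can also be checked mechanically on small examples. The Python sketch below is an informal companion to the proof, not a substitute for it; the function names rep and abstract_invariant and the sample data are assumptions made for the illustration, and the chains are given directly as lists rather than through condis.

```python
def rep(blvl, info, sq):
    """Abstract value <block, assoc> denoted by a concrete state.

    info is indexed 1..n (index 0 unused) and holds (s, bl) pairs;
    sq is the list of m chains of indices into info.
    """
    assoc = {(info[i][0], info[i][1], i) for chain in sq for i in chain}
    return blvl, assoc


def abstract_invariant(block, assoc, n):
    """Ia: size bound, range bounds, and (s, bl) equal iff ui equal."""
    if len(assoc) > n:
        return False
    if any(not (1 <= ui <= n and 1 <= bl <= block) for (_, bl, ui) in assoc):
        return False
    return all(((a[0], a[1]) == (b[0], b[1])) == (a[2] == b[2])
               for a in assoc for b in assoc)


# A state satisfying Ic: chains ordered innermost block first, no duplicate records.
good_info = [None, ("k", 1), ("k", 2), ("x", 2)]
good = rep(2, good_info, [[3], [2, 1]])

# A state violating Ic: two records with the same symbol and block level.
bad_info = [None, ("k", 2), ("k", 2)]
bad = rep(2, bad_info, [[], [2, 1]])

good_ok = abstract_invariant(*good, n=4)
bad_ok = abstract_invariant(*bad, n=4)
```

The second state shows why the concrete invariant insists that equal info records on a chain have equal indices: without it, the uniqueness clause of the abstract invariant fails.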

2. Initialization

   Show:  n≥1 ∧ m≥1 { blvl←1 } <1,{}> = rep(as,info,blvl) ∧ Ic
   Proof: This holds since initially of condis says each sq_j=<>, i.e., ¬mbr(as,j,i)
          and ¬follows(as,i,j). Note that n≥1 ∧ m≥1 permits the declaration
          as:condis.

For the function defined

3. Concrete operation

   Show:  Ic { first j:indis(st.as,hash(str,m)) suchthat st.info[j].s=str
               then t ← st.info[j].bl=st.blvl else t←false } β_out ∧ Ic
   Proof: Ic holds since it is unchanged. Indis may be called since
          0≤hash(str,m)<m. By the first term of Ic, str can only be located from
          sq_hash(str,m). For the then clause, the second term of Ic gives β_out.
          (Note that mbr(st.as,hash(str,m),j) holds by the definition of indis.) For
          the else clause str was not located from sq_hash(str,m), whence t is
          false as required.

4a. β_in holds
    β_in is true

4b. β_post holds

   Show:  Ic ∧ β_out ⊃ t = ∃i st <str,st.block',i> ∈ st.assoc

   Proof: If t is true in β_out, then <st.info[j].s, st.info[j].bl, j> = <str,st.block',i>
          ∈ st.assoc, i.e., choose i to be j. If t is false in β_out, there will be no i
          and t is false as required.

For the function insert

3. Concrete operation

   Show:  β_in ∧ Ic { i←xtnd(st.as,hash(str,m)); st.info[i]←<str,st.blvl> } β_out ∧ Ic
   Proof: The pre of xtnd holds because hash(str,m) ∈ [0..m-1] and because
          ¬full(st.as) means cardinality(st.assoc) < n whence the SIGMA term < n.
          The first term of β_out is clear. Since the hash(str,m)th sequence of as
          is extended, sq_hash(str,m) = <i>~sq'_hash(str,m) where i is the
          appended new element. The first term of Ic follows by the call to
          xtnd and st.info[i].s=str; the second term of Ic follows by Ic and
          ¬defined(st,str), i.e., str is not defined at the current block.

4a. β_in holds

   Show:  Ic ∧ cardinality(st.assoc)<n ∧ ¬defined(st,str) ⊃ β_in
   Proof: cardinality(st.assoc) < n means ¬full(st.as).

4b. β_post holds

   Show:  Ic ∧ β_pre ∧ β_out ⊃ β_post

   Proof: The new triple <st.info[i].s,st.info[i].bl,i> is added to st.assoc.

For the function lookup

3. Concrete operation

   Show:  Ic { first j:indis(st.as,hash(str,m)) suchthat st.info[j].s=str
               then x←j else x←0 } β_out ∧ Ic

   Proof: Ic is unchanged. As in the operation defined, str can only be
          located from sq_hash(str,m). By indis, j ∈ [1..n]. Hence only the else
          clause makes x=0 and, as required in this case, j ∈ [1..n] ∧ ∃i ∈
          [0..m-1](mbr(st.as,i,j)) ⊃ st.info[j].s≠str. For the then clause, the first
          term after x>0 holds by the suchthat clause. For the second term
          after x>0, suppose j≠x. Using the second term of Ic (note that
          follows(st.as,x,j) holds) rules out the possibility that
          st.info[x].bl=st.info[j].bl since otherwise j=x. Hence st.info[x].bl >
          st.info[j].bl.

4a. β_in holds
    β_in is true

4b. β_post holds

   Show:  Ic ∧ β_out ⊃ β_post

   Proof: x=0 means ¬∃y st y.s=str. x>0 means x = y.ui, i.e., y =
          <st.info[j].s,st.info[j].bl,j>.

For the function enterblock

3. Concrete operation

Show: I_c { st.blvl ← st.blvl + 1 } β_out ∧ I_c

Proof: β_out is clear. Since st.blvl increases, I_c still holds.

4a. β_in holds
β_in is true

4b. β_post holds

Show: I_c ∧ β_out ⊃ β_post

Proof: st.block = st.blvl = st.blvl' + 1 = st.block' + 1 and st.assoc = st.assoc'.

For the function leaveblock

3. Concrete operation

Show: β_in ∧ I_c { body } β_out ∧ I_c

Proof: st.blvl = st.blvl' - 1 is clear. By the for statement each sq_i for i ∈
[0..m-1] is adjusted by the first statement. For each of indis, del, and
delall, we have the pre condition i ∈ [0..m-1] by the for statement.
The other part of pre of del, mbr(st.as,i,j), holds by indis. In the then
case, del(st.as,i,j) deletes all entries in sq_i up to but not including j.
Because j is the first j with st.info[j].bl ≤ st.blvl < st.blvl', the block level
ordering asserted by I_c ensures β_out. In the else case all
st.info[j].bl > st.blvl whence sq_i should become <>, which delall does.
β_out follows since st.info[j].bl ≥ st.blvl' = ¬mbr(st.as,i,j). In both the then
and else cases, I_c still holds because the lists only get shorter and
st.blvl > 1 on entry.

4a. β_in holds

Show: I_c ∧ st.block > 1 ⊃ st.blvl > 1

Proof: In the rep function, st.block and st.blvl correspond.

4b. β_post holds

Show: I_c ∧ β_pre ∧ β_out ⊃ β_post

Proof: Since st.blvl = st.blvl' - 1, we have st.block = st.block' - 1 as required. By
β_out and the rep function, st.assoc = st.assoc' - {<s,x,ui> st x ≥ st.block'}.

For the function full

3. Concrete operation

Show: I_c { t ← full(st.as) } β_out ∧ I_c

Proof: β_out is exactly the post condition of full in condis. I_c is unchanged.

4a. β_in holds
β_in is true

4b. β_post holds

Show: I_c ∧ β_out ⊃ β_post

Proof: t = (SIGMA_{i∈[0..m-1]} length(s.sq_i) = n) = (cardinality(st.assoc) = n).

QED
As discussed earlier, the abstract representation of condis is a set of precisely m
sequences of integers. The integers in these sequences are all in the range 1 to n, and a
particular integer appears at most once in some sequence.

As one might expect, the sequences will be represented by singly linked lists. In fact
we shall use an integer vector, lt (for link-table), to store all of the lists which represent
sequences in a condis. The fact that an index i into lt is in the kth position of such a list will
represent the fact that i appears in the kth position of the corresponding abstract sequence.
A separate vector, sq, of length m, is used for the heads of the lists. In all cases, zero, which
is not a legal condis sequence element, is used to indicate the end of a list; thus, in particular,
if sq[j]=0, the jth condis sequence is empty (see footnote 5). A separate list of those integers which are not
currently members of any sequence is also maintained, and the head of this list is maintained
in the simple variable free. The following diagram illustrates one possible configuration of a
condis object which has been declared with m=3 and n=10:


form condis(n,m:integer) =

beginform

specifications

requires n≥1 ∧ m≥1;

let condis = {sq_i:<e_i1, e_i2, ..., e_in_i> | 0≤i≤m-1 ∧ e_ij is integer};
invariant 1≤e_ij≤n ∧ ∀i,j ∈ [0..m-1] (e_ik1 = e_jk2 ⊃ i=j ∧ k1=k2);
initially ∀i ∈ [0..m-1] sq_i = <>;
functions

xtnd(s:condis, i:integer) returns j:integer

pre i ∈ [0..m-1] ∧ SIGMA_{i∈[0..m-1]} length(s.sq_i) < n,

post s.sq_i = <j>~s.sq_i', ! note j is a new value not in any sq (by I_a)

del(s:condis, i,j:integer)

pre s.sq_i = <..., j, ...> ∧ i ∈ [0..m-1]
post s.sq_i = <j, ...>,

delall(s:condis, i:integer)
pre i ∈ [0..m-1]
post s.sq_i = <>,

full(s:condis) returns t:boolean

post t = SIGMA_{i∈[0..m-1]} length(s.sq_i) = n;

5. We can now explain why the function delall is not redundant. The knowledge that
zero ends a list is private to condis, and therefore it is not known in symtab. Hence, in the
body of leaveblock of symtab, the operation "delall(as,i)" cannot be replaced by "del(as,i,0)".
To do so would violate the pre condition of del because if j is a member of sq_i, it means j≥1.
generator indis(s:condis, i:integer) extends x:integer
requires 0≤i≤m-1
let indis = s.sq_i where indis ≠ <> ⊃

(indis = c~<x>~d and c, <x>, and d are disjoint);
rule for(I, x, <s,i>, ST) =

premise s.sq_i = c~<x>~d ∧ I(c) {ST} I(c~<x>);

rule first(P, x, <s,i>, β, S1, S2, Q) =

premise s.sq_i = c~<x>~d ∧ P ∧ ∀y ∈ c (¬β(y)) ∧ β(x) {S1} Q,
premise P ∧ ∀y ∈ s.sq_i ¬β(y) {S2} Q;
auxiliary predicates

follows(s:condis, i,j:integer) = ∃k st sq_k = <..., i, ..., j, ...>,
mbr(s:condis, i,j:integer) = sq_i = <..., j, ...>;

representation

unique

sq: vector(integer,0,m-1),

lt: vector(integer,1,n),
free: integer

init begin free ← 1; for i:upto(1,n-1) do lt[i] ← i+1; lt[n] ← 0;
for i:upto(0,m-1) do sq[i] ← 0 end;
rep(sq,lt,free) = {SQ_i | 0≤i≤m-1} where
if sq[i] = 0 then SQ_i = <> else

if sq[i] = p_1 ∧ (∀j ∈ [1..K-1] lt[p_j] = p_{j+1}) ∧ lt[p_K] = 0 then SQ_i = <p_1, ..., p_K>;

invariant

0 ≤ free ≤ n

∧ ∀j ∈ [0..m-1] 0 ≤ sq[j] ≤ n
∧ ∀K ∈ [1..n] 0 ≤ lt[K] ≤ n

∧ {free, sq[j], lt[K]} = {m+1 0's, 1, 2, ..., n} ! this term is a multiset equality
∧ ∀i ∈ [1..n] (succ(free,i) xor ∃!j (succ(sq[j],i)))

where succ(i,j) =df i=j ∨ (i≠0 cand succ(lt[i],j));

implementation

body xtnd in s.free≠0 ∧ i ∈ [0..m-1]

out (succ(s.free',j) ∧ succ(s.sq[i],j) ∧ s.sq[i]=j ∧ s.lt[j] = s.sq'[i]) =
begin

j ← s.free; s.free ← s.lt[j];
s.lt[j] ← s.sq[i]; s.sq[i] ← j;
end;

body del in succ(s.sq[i],j) ∧ i ∈ [0..m-1] ∧ j ∈ [0..n] out (s.sq[i]=j) =
if s.sq[i]≠j then

begin local k:integer;
k ← s.sq[i];

while s.lt[k] ≠ j do k ← s.lt[k];

s.lt[k] ← s.free; s.free ← s.sq[i]; s.sq[i] ← j;

end;


body delall in i ∈ [0..m-1] out (s.sq[i]=0) =

s.del(s,i,0); ! a call to the concrete body del, not the abstract function del

body full out (t = (s.free=0)) =
t ← s.free=0;

formbody indis =

beginform

representation

rep(s.sq, s.lt, i, x) =

if s.sq[i] = 0 then <> else

if x = 0 then c~d where c = s.sq_i and d = <> else c~<x>~d

where c = <p_1, ..., p_{r-1}>, x = p_r, d = <p_{r+1}, ..., p_k>,

p_1 = s.sq[i], s.lt[p_k] = 0, and (∀j ∈ [1..k-1] s.lt[p_j] = p_{j+1});
invariant true;
implementation

body &init out (x=s.sq[i] ∧ (&b = s.sq[i]≠0)) =
(x ← s.sq[i]; &b ← x≠0);

body &next in succ(s.sq[i],x) ∧ x≠0 out (x=s.lt[x'] ∧ (&b = s.lt[x']≠0)) =
(x ← s.lt[x]; &b ← x≠0);
endform

endform


The implementation of the four operations in condis should be fairly obvious. xtnd
merely removes an entry from the free list and places it at the head of the appropriate list;
note that this entry is returned (in j) as the value of function xtnd. del is a bit more
interesting. It searches the appropriate list for the entry in lt which points to the first entry,
j, which is not to be removed. It then moves the entire initial portion of the list to the free
space list by simply setting the proper pointers. If all the entries are to be removed, delall
does this; it calls del to search for the list-ending zero and to move the entire list to the free
space list. full just tests if the free space list is empty.
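
The four operations can be mirrored in a short Python sketch. This is an illustration only, not part of the Alphard form: the class name Condis, the method name delete (standing in for del, which is reserved in Python), and the helper seq (which plays the role of the rep function for a single list) are assumptions introduced here. Index 0 of lt is unused padding so that entries are numbered 1..n as in the text.

```python
class Condis:
    """m disjoint sequences over 1..n, stored as linked lists in shared vectors."""

    def __init__(self, n, m):
        assert n >= 1 and m >= 1
        self.n, self.m = n, m
        self.sq = [0] * m                      # sq[i]: head of list i, 0 = empty
        # lt[k]: successor of entry k; lt[0] is unused padding
        self.lt = [0] + [k + 1 for k in range(1, n)] + [0]
        self.free = 1                          # head of the free list

    def xtnd(self, i):
        """Move the head of the free list to the front of list i and return it."""
        assert 0 <= i < self.m and self.free != 0
        j = self.free
        self.free = self.lt[j]
        self.lt[j] = self.sq[i]
        self.sq[i] = j
        return j

    def delete(self, i, j):
        """Drop every entry of list i that precedes j (j == 0 drops the whole list)."""
        if self.sq[i] != j:
            k = self.sq[i]
            while self.lt[k] != j:             # find the entry that points to j
                k = self.lt[k]
            self.lt[k] = self.free             # splice the prefix onto the free list
            self.free = self.sq[i]
            self.sq[i] = j

    def delall(self, i):
        self.delete(i, 0)                      # search for the list-ending zero

    def full(self):
        return self.free == 0

    def seq(self, i):
        """Abstract sequence i: follow links from sq[i] until the 0 terminator."""
        out, k = [], self.sq[i]
        while k != 0:
            out.append(k)
            k = self.lt[k]
        return out
```

As in the Alphard body, delete assumes its precondition (j is a member of list i, or j is 0); calling it with any other j would not terminate correctly.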
The predicate succ defined in the concrete invariant is closely related to the abstract
predicate follows. Although the parameterizations of the two predicates are different, they
ask the "same" question and are related by

follows(rep(sq,lt,free), i, j) = succ(i, j)
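
A direct Python transcription of succ may make the definition concrete. It is a sketch under the assumption that lt is a Python list with an unused slot at index 0 and that the lists are acyclic (which the concrete invariant guarantees); the recursion of the definition is written as a loop.

```python
def succ(lt, i, j):
    """True if j is reachable from i by zero or more lt links (0 ends a list).

    Mirrors: succ(i,j) =df i=j or (i != 0 cand succ(lt[i], j)).
    """
    while True:
        if i == j:
            return True
        if i == 0:
            return False
        i = lt[i]
```

For example, with lt = [0, 0, 1, 0, 5, 0] the list headed by 2 is <2, 1> and the list headed by 4 is <4, 5>, so succ(lt, 2, 1) holds while succ(lt, 2, 3) does not. Note that succ(lt, x, 0) holds for every x on a well-formed list, which is the fact delall relies on.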


The form indis(s,i) defines a generator for elements of the integer sequence s.sq_i, starting
with first(s.sq_i). Abstractly, an indis is composed of three (sub)sequences, the first containing the
elements already generated, the second the (singleton) current element, and the third the
other elements yet to be seen.

In  [Shaw76b]  we  discussed  the  proof  rules  for  iteration  statements.  We  showed  that 
certain  simplifying  assumptions  about  the  generator  can  yield  simple  proof  rules;  these 
assumptions  are  satisfied  by  indis,  as  we  will  show  in  the  verification  of  condis.  We  therefore 
have  a proof  rule  for  the  for  statement  which  corresponds  closely  to  Hoare’s  sequence  rule 
and  also  a proof  rule  for  the  first  statement.  These  proof  rules  are  given  in  the  specifications 
of  indis,  and  indeed  constitute  the  major  part  of  those  specifications.  The  basis  for  this 
specification  technique  for  generators  is  given  in  [Shaw76b]. 
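
The behaviour of the generator and of the first statement can be approximated in Python. This is only an analogy: a Python generator stands in for the &init/&next pair, and the helper named first is a hypothetical function introduced here to imitate "first x: indis(s,i) suchthat β(x) then S1 else S2", returning 0 where the else branch would run.

```python
def indis(sq, lt, i):
    """Yield the elements of list i in order."""
    x = sq[i]              # &init: start at the head of list i
    while x != 0:          # &b becomes false at the 0 terminator
        yield x
        x = lt[x]          # &next: follow the link


def first(gen, pred):
    """Return the first generated element satisfying pred, or 0 if none does."""
    for x in gen:
        if pred(x):
            return x
    return 0
```

With sq = [2, 3] and lt = [0, 0, 1, 0, 5, 0], list(indis(sq, lt, 0)) produces [2, 1]. At each step the elements already yielded correspond to c, the current element to <x>, and the elements still to come to d.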


Verification  of  Condis 


We  can  now  verify  the  form  condis. 


For the form

1. Representation validity

Show: I_c(sq,lt,free) ⊃ I_a(rep(sq,lt,free))

Proof: 1≤e_ij≤n holds by the bounds on sq[j] and lt[k] and the fact that the
rep function drops the zeroes that indicate the end of a list. The e_ij's
are distinct because the multiset {sq[j], lt[k]} contains each of 1, 2, ...,
n at most once. The multiset property of I_c implies succ(free,0) and
succ(sq[j],0).

2. Initialization

Show: n≥1 ∧ m≥1 { init } ∀i ∈ [0..m-1] sq_i = <> ∧ I_c

Proof: After init have free=1, lt[1]=2, ..., lt[n-1]=n, lt[n]=0, sq[0]=0, ...,
sq[m-1]=0. Using the rep function, each sq_i = <> since each sq[i]=0.
n≥1 means 0≤free≤n. The bounds on sq[j] and lt[k] and the multiset
property are clear. ∀i ∈ [1..n] (succ(free,i) ∧ ¬succ(0,i)).
For the function xtnd

3. Concrete operation

Show: s.free≠0 ∧ i ∈ [0..m-1] ∧ I_c { body } β_out ∧ I_c

Proof: The four terms of β_out are clear as are the bounds in I_c. The
multiset property holds because the body permutes the values s.free',
s.sq'[i], and s.lt'[s.free']. Since the head of s.free moves to the head of
s.sq[i], each i ∈ [1..n] still satisfies exactly one succ term. β_in (and I_c)
ensures that the accesses to s.lt and s.sq are within bounds.

4a. β_in holds

Show: I_c ∧ β_pre ⊃ β_in

Proof: i ∈ [0..m-1] is immediate. If s.free=0, then the multiset property of
I_c means, using the rep function, that the SIGMA term is exactly n, a
contradiction. Hence s.free≠0.

4b. β_post holds

Show: I_c ∧ β_pre ∧ β_out ⊃ sq_i = <j>~sq_i'

Proof: Since s.sq[i]=j and s.lt[j]=s.sq'[i], the rep function gives sq_i = <j>~sq_i'.
For the function del

3. Concrete operation

Show: β_in ∧ I_c { body } s.sq[i]=j ∧ I_c

Proof: If s.sq[i]=j then β_out holds and I_c is unchanged. If s.sq[i]≠j then
define the set G_p = { x | succ(s.sq[i],x) ∧ succ(x,p) }. Add the ghost
operation "H ← H ∪ {k}" after "k ← s.lt[k]" in the while loop and add
"H ← {k}" after "k ← s.sq[i]". A while-loop invariant (placed before the
test) is then H = G_k because G_s.sq[i] = {s.sq[i]} and

H = G_k ∧ s.lt[k]≠j ⊃ H ∪ {s.lt[k]} = G_s.lt[k]

The while terminates because succ(s.sq[i],j) and s.sq[i]≠j. At
termination s.lt[k]=j and H = G_k. The multiset property of I_c holds
because the last three statements in the body permute the values
s.free', s.sq'[i], and s.lt'[k]. Furthermore, each element in H is now a
successor of s.free rather than of s.sq[i]. All other successors of
s.sq[i] and all previous successors of s.free remain so, respectively.
β_out and the bounds in I_c are clear.

4a. β_in holds

Show: I_c ∧ β_pre ⊃ β_in

Proof: Immediate from β_pre and I_a for condis.

4b. β_post holds

Show: I_c ∧ β_pre ∧ β_out ⊃ sq_i = <j, ...>

Proof: Only sq_i changes. sq_i now begins with j and there are no other
changes to sq_i.
For the function delall

3. Concrete operation

Show: β_in ∧ I_c { s.del(s,i,0) } s.sq[i]=0 ∧ I_c

Proof: β_in and the multiset property of I_c imply in holds for s.del. (I_c
holds for s.del as required.) The out for s.del gives s.sq[i]=0. I_c
after s.del gives I_c after delall.

4a. β_in holds

Show: i ∈ [0..m-1] ⊃ i ∈ [0..m-1]

Proof: Immediate

4b. β_post holds

Show: I_c ∧ i ∈ [0..m-1] ∧ s.sq[i]=0 ⊃ sq_i = <>

Proof: Only sq_i changes. s.sq[i]=0 means sq_i = <>.

For the function full

3. Concrete operation

Show: I_c { t ← s.free=0 } t = (s.free=0) ∧ I_c

Proof: Immediate

4a. β_in holds
β_in is true

4b. β_post holds

Show: I_c ∧ β_out ⊃ β_post

Proof: t = (s.free=0) = (SIGMA ... = n) using the multiset property of I_c.

To verify the indis generator, we must first reconstruct the pre and post conditions
from the specified proof rules:

&init

post (&b = s.sq_i≠<>) ∧ (&b ⊃ x = first(s.sq_i) ∧ c = <>)

&next

pre mbr(s,i,x)

post (&b = d'≠<>) ∧ (&b ⊃ x = first(d') ∧ c = c'~<x'>)

Next, we must show that indis satisfies the standard aggregate assumptions:

(a) The indis abstraction is explicated in terms of sequences. The normal empty
sequence (<>), concatenation operator (~), and leading element selector
(first) are available.

(b) The complete sequence to be generated is s.sq_i, which can be decomposed as
indicated in the let clause of indis.

(c) The specifications of &init and &next have the required form.

Furthermore, indis satisfies the basic generator assumptions because (a) &init and &next
terminate and (b) &init and &next alter only the indis variable x (and the return value &b).

Since "sq", "lt", and "free" are unchanged by indis, the I_c of condis still holds and will be
used in the proof.

For the form (indis)

1. Representation validity
Show: I_c ⊃ I_a, i.e., true ⊃ true
Proof: Immediate

2. Initialization

Show: 0≤i≤m-1 { } true ∧ true
Proof: Immediate

For the function &init

3. Concrete operation

Show: true { x ← s.sq[i]; &b ← x≠0 } x=s.sq[i] ∧ (&b = s.sq[i]≠0)

Proof: Clear

4a. β_in holds
β_in is true

4b. β_post holds

Show: x=s.sq[i] ∧ (&b = s.sq[i]≠0) ⊃

(&b = s.sq_i≠<>) ∧ (&b ⊃ x = first(s.sq_i) ∧ c = <>)

Proof: From the rep function for indis, s.sq_i = (if s.sq[i]=0 then <> else
some non-empty sequence). Hence &b = s.sq[i]≠0 = s.sq_i≠<>. For the
second term of the conclusion, assume &b. Then x=s.sq[i]≠0 and the
final clause of rep gives s.sq_i = c~<x>~d. Since x=s.sq[i]=p_1, then c =
<> whence also x = first(s.sq_i).

For the function &next

3. Concrete operation
Similar to &init.

4a. β_in holds

Show: mbr(s,i,x) ⊃ succ(s.sq[i],x) ∧ x≠0

Proof: mbr(s,i,x) means x≠0 by I_a for condis. The term succ(s.sq[i],x)
follows from mbr(s,i,x), the rep function, and the definition of succ.

4b. β_post holds

Show: mbr(s,i,x') ∧ x=s.lt[x'] ∧ (&b = s.lt[x']≠0) ⊃
(&b = d'≠<>) ∧ (&b ⊃ x = first(d') ∧ c = c'~<x'>)

Proof: mbr(s,i,x') means x'≠0 and s.sq_i≠<>, and therefore by the rep
function also s.sq[i]≠0. Hence in the final clause of the rep function,
&b = s.lt[x']≠0 = d'≠<>. For the second term of the conclusion, assume
&b. Then x=s.lt[x']≠0 and the final clause of rep gives s.sq_i = c~<x>~d
and, because x'≠0, also s.sq_i = c'~<x'>~d'. Since x=s.lt[x'], it follows
that x = first(d') and c = c'~<x'>.

QED


Examples  of  the  Use  of  Symtab 


In  this  section  we  shall  present  a skeletal  example  which  involves  three  different  styles 
of  usage  of  the  symtab  abstraction.  It  is  not  our  intent  either  to  make  this  example  complete 
or  to  suggest  that  the  utility  of  the  abstraction  is  limited  to  these  three  cases.  Rather,  we 
wish  to  bolster  the  reader’s  intuition  about  ways  in  which  the  abstraction  might  be  used. 

The  example  we  have  chosen  is  a multi-pass  compiler  for  an  Algol-like  (i.e.,  block- 
structured)  language,  and  indeed  we  have  restricted  ourselves  to  the  first  two  passes  — 
lexical  and  syntactic  analysis,  respectively.  In  this  scheme,  the  first  pass  is  responsible  for 
reading  units  of  the  source  file  (identifiers,  literals,  punctuation  marks,  etc.)  and  converting 
them  to  an  internal  form  called  a "lexeme".  These  lexemes  are  written  onto  a file  which  will 
be  read  again  by  the  second  pass.  The  second  pass  is  responsible  for  reading  the  file  of 
lexemes  generated  by  the  first  pass  and  performing  syntactic  analysis.  Although  it  is  not 
important  to  our  example,  the  output  of  the  second  pass  will  likely  be  some  other  intermediate 
representation  (e.g.,  reverse  polish  or  trees)  which  is  suitable  for  optimization  and  code 
generation. 

Here, then, is the skeletal program; more detailed comments on the uses of the symtab
abstraction,  and  on  the  program  in  general,  follow  the  example. 

function compiler (source: file(char)) =
begin

    form condis ...
    form symtab ...
    form id extends string =
        beginform
        specifications

            function hash (s:id, m:integer) returns k:integer pre m>0 post 0≤k<m';
        endform;
    form lex extends integer =
        beginform
        specifications

            function hash (x:lex, m:integer) returns k:integer pre m>0 post 0≤k<m';
        endform;
    local L: file(lex);
    begin ! pass 1

        local NT: symtab (id, 127, 1000);

        ! ...
        ! pure lexical pass; see discussion below.
        ! ...

    end;


    begin ! pass 2

        form attributes = ... ! see discussion below

        local A: vector (attributes, 1, 2000);
        local ST: symtab (lex, 127, 2000);

        ! ...
        ! syntactic (parse) analysis pass; see discussion below.
        ! ...

    end;


This program first defines four forms. Symtab and condis have been defined in detail
previously and hence are not repeated. The forms id and lex are extensions of strings and
integers, respectively, and merely add hashing functions; we have not defined the
implementations of these functions, since they are not germane to the example. Note too that
a file of lexes is defined at the outermost block level; this file is the explicit interface between
the first and second passes.

As  noted  earlier,  the  function  of  the  first  pass  is  to  convert  the  external  representation 
of  the  program  (a  file  of  characters)  into  a more  convenient  internal  form  — namely  a file  of 
lexemes  (where  each  lexeme  represents  an  atom  of  the  language).  Since  this  pass  does  no 
syntactic  analysis,  in  particular  it  does  not  recognize  block  structure.  This  implies  that  all 
occurrences  of  the  same  atom  (e.g.,  "xyz")  will  be  mapped  to  the  same  lexeme.  This  mapping 
is  accomplished  through  the  use  of  the  NT  (for  name-table)  instantiation  of  symtab;  indeed,  the 
only use of NT is to obtain this unique mapping and the instantiation is therefore deleted on
exit  from  the  block  in  which  the  first  pass  is  accomplished. 
In skeletal form, the body of the block for pass 1 might look somewhat as follows:

open(source); open(L);
while ¬endoffile(source) do
    begin

    local i:id, x:lex;

    ! do whatever is appropriate to assemble the next atom
    ! from the source file into "i".

    ! ...

    if (x ← lookup(NT,i)) = 0 ∧ ¬full(NT) then x ← insert(NT,i);
    write (L,x);
    end;

rewind(L);


Note that the operations enterblock and leaveblock are not used; all insert operations
are done at the same block level, and only one entry per atom will be made.
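
The pass 1 loop amounts to interning each atom. A minimal Python sketch follows, under the assumption that a dictionary stands in for NT and a list for the file L; the function name lexical_pass and its capacity parameter are hypothetical and chosen only for illustration.

```python
def lexical_pass(atoms, capacity=1000):
    """Map each atom to a unique positive integer; repeated atoms share one."""
    name_table = {}        # stands in for NT; every entry is at one block level
    lexemes = []           # stands in for the file L
    for atom in atoms:
        x = name_table.get(atom, 0)                  # lookup: 0 means not found
        if x == 0 and len(name_table) < capacity:    # not found and not full
            x = len(name_table) + 1                  # insert: next unused integer
            name_table[atom] = x
        lexemes.append(x)                            # write(L, x)
    return lexemes
```

As in the skeleton, an atom that cannot be inserted because the table is full is written as 0; a real compiler would presumably report an error at that point.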

The  second  pass  is  substantially  more  complex  since  it  performs  the  full  syntactic 
analysis;  hence  we  will  not  even  attempt  to  illustrate  its  skeletal  form.  We  would,  however, 
like  to  point  out  several  things  about  it. 

First, notice that this block defines a form named attributes. We have not shown the
body of this form, since it will be highly language- and machine-specific. However, the notion
is that this form provides for the storage and manipulation of whatever information must be
retained about a symbol, e.g., its type, run-time storage address, array bounds, and so forth.

Second,  we  have  declared  a vector,  A,  of  these  attribute  objects.  As  suggested  in  an 
earlier  section,  instances  declared  at  a given  block  level  will  be  associated  with  a unique 
integer,  but  this  integer  will  be  different  from  the  one  associated  with  the  same  identifier 
declared  at  a different  block  level.  These  integers  will,  in  turn,  be  used  as  indices  into  the 
vector A (e.g., to set and retrieve information about the identifier).

Finally, we have declared another instantiation of symtab, ST. This one will be used to
recognize block structure, and, specifically, will map from the simple lexemes generated in the
first pass into indices into the vector, A, of attributes. As the parser detects blocks (begin-
end pairs) in the source program, it will invoke enterblock and leaveblock. The declaration
processing routines will invoke defined to determine whether an identifier has been declared
twice at the same block level (presumably an error), and perform insert operations to define
the instances of the identifier at the current block level. The rest of the compiler will perform
lookup  operations  to  obtain  the  index  of  the  attribute  vector  entry  associated  with  specific 
lexemes.  (Note,  by  the  way,  that  by  appropriate  ordering  of  insert  and  lookup  operations  the 
declaration  processor  can  obtain  either  of  the  interpretations  of  "block-structure"  discussed  in 
the  introduction.) 
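
The way pass 2 uses ST can be illustrated with a small Python model of the abstract behaviour. It is a sketch, not the symtab form itself: the class name SymTab and the list-of-triples storage are assumptions, and unique indices come from a simple counter rather than from condis slots, so they are never reused here.

```python
class SymTab:
    """Block-structured table: the innermost declaration of a name wins."""

    def __init__(self):
        self.blvl = 1
        self.next_index = 1
        self.entries = []          # (name, block level, unique index), newest last

    def enterblock(self):
        self.blvl += 1

    def leaveblock(self):
        assert self.blvl > 1
        # discard everything declared in the block being left
        self.entries = [e for e in self.entries if e[1] < self.blvl]
        self.blvl -= 1

    def defined(self, name):
        """True if name is already declared at the current block level."""
        return any(s == name and b == self.blvl for s, b, _ in self.entries)

    def insert(self, name):
        assert not self.defined(name)
        index = self.next_index
        self.next_index += 1
        self.entries.append((name, self.blvl, index))
        return index

    def lookup(self, name):
        """Index of the innermost visible declaration of name, or 0."""
        for s, _, index in reversed(self.entries):
            if s == name:
                return index
        return 0
```

A declaration processor would call defined before insert to detect a duplicate declaration, and the returned index would select the entry of A that holds the identifier's attributes.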
Before leaving this example, let us return to the form attributes (defined in pass 2) to
illustrate another potential use of the symtab abstraction. As was mentioned in the
introduction, in general the mapping from identifier to unique integer may be context-
sensitive. Block structure is the most familiar form of such sensitivity, but another is name
qualification, as in field selectors for records. In many languages one makes a declaration such
as


x:record(name:string,  age:integer,  z:integer); 

and then refers to "x.name", "x.age", and "x.z". A problem arises when, at the same block
level,  there  is  another  declaration  such  as 

y:record(ss:integer,  z:boolean); 

In  such  a case  the  identifier  "z"  is  no  longer  unique  --  its  interpretation  depends  upon  the 
name  it  qualifies. 

There are many ways one might treat this, including inserting each of "x", "x.name",
"x.age", "x.z", "y", "y.ss", and "y.z" as complete identifiers in ST. An attractive alternative,
however,  is  to  include  instantiations  of  symtab  in  each  of  the  attributes;  that  is,  to  make  form 
attributes  appear  somewhat  as  follows: 

form attributes =
    beginform

    representation

        unique qual:symtab(lex,1,10),

    endform;


If this is done, then to determine the interpretation of "x.z" one would first search ST
for the index, i, associated with the lexeme for "x", then search A[i].qual for the index
associated with the lexeme for "z".
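
The two-step search can be sketched in Python. Everything here is illustrative: dictionaries stand in for the symtab instances ST and qual, and the names Attributes, declare, and resolve are hypothetical helpers rather than anything defined in the paper.

```python
class Attributes:
    """Per-symbol record; qual maps field names to attribute indices."""

    def __init__(self, kind):
        self.kind = kind
        self.qual = {}            # stands in for the nested symtab instance


A = [None]                        # attribute vector; index 0 means "not found"
ST = {}                           # outer table: name -> index into A


def declare(table, name, kind):
    """Allocate an attribute entry and record its index under name in table."""
    A.append(Attributes(kind))
    table[name] = len(A) - 1
    return table[name]


def resolve(record, field):
    """Two-step lookup for "record.field"; returns 0 if either step fails."""
    i = ST.get(record, 0)
    return A[i].qual.get(field, 0) if i else 0


# x:record(..., z:integer) and y:record(..., z:boolean) at the same block level
x = declare(ST, "x", "record")
xz = declare(A[x].qual, "z", "integer")
y = declare(ST, "y", "record")
yz = declare(A[y].qual, "z", "boolean")
```

Here resolve("x", "z") and resolve("y", "z") yield different indices, so the two fields named "z" keep separate attribute entries.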

Although  this  compiler  example  has  been  sketchy,  we  hope  that  it  has  suggested  some 
of  the  ways  in  which  the  symtab  abstraction  may  be  applied.  The  details  of  the  example  are 
not  important,  except  insofar  as  they  help  the  reader’s  intuition;  what  is  important  is  the 
notion  that  well-chosen  abstractions  have  many  uses.  The  class  of  broadly  useful  abstractions 
is  simply  too  large  to  include  them  all  in  a single  programming  language  --  hence  Alphard  has 
chosen to provide a linguistic facility so that the programmer may define them. Many such
(verified) abstractions will find their way into the library, and hence incrementally enhance the
"power" available to the programmer -- without, at the same time, limiting him to the language
designer's preconceived notions of what constitutes an appropriate set of abstractions (or, for
that matter, implementations).


Conclusions 

A programming language is a tool for the construction and communication of programs;
as such its utility should be measured relative to these tasks. In other words, the language
should be used, and the quality of that use must be judged. While this is true of any
programming language, it is especially so of one such as Alphard, which departs substantially
from those in common use.

Thus,  in  this  and  other  reports  we  are  attempting  to  exhibit  Alphard  in  relatively 
realistic  contexts  and,  along  with  the  reader,  to  judge  the  practical  utility  of  our  creation.  It  is 
far  too  soon  to  draw  definitive  conclusions  --  that  must  await  the  use  of  Alphard  in  real 
programs  — but  we  would  like  to  share  some  of  our  impressions  resulting  from  these 
experiences. 

First,  the  symtab  abstraction  is  about  the  (conceptual)  size  we  envision  for  most 
abstractions;  larger  programs  will  be  constructed  by  further  "layering".  Thus  we  take  our 
ability  to  specify  and  verify  this  form  as  fairly  strong  evidence  that  larger  programs  will  also 
be  tractable. 

Second,  in  most  respects  the  implementation  is  a practical,  efficient  one.  This  reinforces 
our  intuitions  that  no  efficiency  need  be  sacrificed  to  obtain  clear,  verifiable  programs.  (The 
one  exception  to  this  statement  is  our  use  of  fixed-sized  vectors  and,  correspondingly, 
integers  for  the  unique  identification  of  symbols.  A more  realistic  implementation  would, 
perhaps,  have  done  true  dynamic  storage  allocation  and  used  references.  We  avoided  this 
implementation  primarily  because  it  would  have  carried  us  into  portions  of  Alphard  not 
covered  in  previous  reports,  but  also  because  those  portions  of  the  language  are  still  in  flux. 
We  trust  that  the  reader  will  forgive  this  departure  from  realism.) 

Third,  one  of  the  anticipated  advantages  of  an  Alphard-like  language  is  that  a library  of 
verified  abstractions  will  develop.  Both  of  the  forms  developed  here  might  well  go  into  that 
library  so  we  are  getting  some  evidence  that  this  hoped-for  advantage  will  be  realized. 

Fourth,  one  of  our  private  objectives  was  to  make  the  form  mechanism  strong  enough  to 
support  an  extremely  broad  class  of  abstractions  --  the  ultimate  target  being  the  spectrum 
covered  by  our  intuitive  notion  of  the  word  "abstraction”.  The  evidence  is  not  conclusive,  but 
we are feeling better about meeting that goal all the time.

Finally, we should say a few words about our experience concerning the effort needed
to define a form. It should be clear that the actual code in a form body, i.e., the
implementation part, is roughly the same size as the corresponding code in other languages
(although the first statement does seem to shorten many of the examples). Moreover, for
some reason, the information needed for verification (abstract and concrete invariants,
abstract pre and post conditions, rep function, etc.) usually seems about equal to the code
size; thus a full form is about twice the size of the code alone. This does not particularly
concern us, since these kinds of specifications tend to replace much of the documentation that
would otherwise be needed -- and they are certainly more precise.

We  find  the  verification  of  a form,  once  the  specifications  and  code  have  been  written, 
to  be  more  difficult  and  time-consuming  than  coding,  but  not  unreasonably  so  (say  by  as  much 
as  a factor  of  two  or  three).  Sometimes  it  is  necessary  to  modify  the  specifications,  or  the 
code,  during  the  verification  in  order  to  remove  inconsistencies  that  are  uncovered.  The 
verification  may  also  suggest  different  specifications,  usually  ones  that  are  more  constrained 
but  sometimes  simpler  ones.  In  spite  of  the  difficulties,  the  bodies  of  functions  tend  to  be 
small  and  their  proofs  correspondingly  small,  as  can  be  seen  from  these  examples.  Moreover, 
the  proofs  of  the  two  forms  symtab  and  condis  were  independent.  To  date  our  proofs  have 
been  manually  generated,  but  we  envision  having  automated,  interactive  aids  in  the  future. 
These  should  reduce  the  verification  time  to  approximately  the  coding  time.  Since  this  is  less 
than the time currently spent on debugging, we feel highly encouraged.

The  majority  of  our  time  goes  into  designing  and  specifying  the  abstraction.  There  are 
two  related  aspects  of  this:  getting  the  intuitive  abstraction  "right",  and  formalizing  it  (at  least 
sufficiently  for  it  to  be  verified).  The  two  appear  related  in  that  difficulty  in  formalizing  an 
intuitive  abstraction  often  seems  to  uncover  muddy  thinking  at  the  intuitive  level.  While  we 
seem  to  be  improving  our  ability  to  formalize,  indicating  that  it  is  a learnable  skill,  we  have  no 
easy  rules  for  picking  the  right  abstraction  in  the  first  place.  While,  with  practice,  our  abilities 
in  choosing  abstractions  may  also  improve,  we  suspect  that  this  is  a fundamental  problem  of 
design  and  has  a significant  aesthetic  component. 

It  is  clear  that  we  are  just  learning  to  use  the  power  of  the  tools  we  are  creating  and 
exploring.  Much  remains  to  be  discovered  about  what  is  possible  or  impossible,  easy  or  hard, 
and  reasonable  or  unreasonable  to  do  with  the  facilities.  In  this  connection  we  note  that  an 
early  version  of  symtab  was  a one-level  form,  used  no  generator  such  as  indis,  and  had  only 
some  of  the  same  verification  information.  Although  that  version  of  symtab  used  the  same 
implementation  ideas,  it  was  essentially  incomprehensible.  When  we  realized  that  multiple 
ideas  were  becoming  confused,  we  separated  the  maintenance  of  the  lists  from  the  lookup 
algorithms.  The  result  was  that  the  code,  the  specifications,  and  the  verification  all  became 
much  more  manageable. 
Appendix  A

Informal  Description  of  Verification  Methodology 


Alphard’s  verification  methodology  is  designed  to  determine  whether  a form  will  actually 
behave  as  promised  by  its  abstract  specifications.  The  methodology  depends  on  explicitly 
separating  the  description  of  how  an  object  behaves  from  the  code  that  manipulates  the 
representation  in  order  to  achieve  that  behavior.  It  is  derived  from  Hoare’s  technique  for 
showing  correctness  of  data  representations  [Hoare72]. 

The  abstract  object  and  its  behavior  are  described  in  terms  of  some  mathematical 
entities  natural  to  the  problem  domain.  Graphs  are  used  in  [Shaw76a]  to  describe  binary 
trees;  sequences  are  used  in  [Wulf76a,b]  to  describe  queues  and  stacks  and  in  condis  to 
describe  list  processing,  and  so  on.  We  appeal  to  these  abstract  types 

- in  the  invariant,  which  explains  that  an  instantiation  of  the  form  may  be  viewed 
  as  an  object  of  the  abstract  type  that  meets  certain  restrictions, 

- in  the  initially  clause,  where  a  particular  abstract  object  is  displayed,  and 

- in  the  pre  and  post  conditions  for  each  function,  which  describe  the  effect  the 
  function  has  on  an  abstract  object  which  satisfies  the  invariant. 
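
As a rough illustration of these three kinds of clauses, the sketch below writes an abstract specification for a bounded stack over sequences, the abstract type the text says is used for stacks in [Wulf76a,b]. It is Python rather than Alphard, and the capacity `N` and all function names are assumptions made for this example only.

```python
# Abstract specification of a bounded stack, stated over immutable
# sequences (Python tuples). Illustrative only; this is not Alphard syntax.

N = 3  # hypothetical capacity bound


def invariant(s):
    """Abstract invariant: a stack is a sequence of at most N elements."""
    return isinstance(s, tuple) and len(s) <= N


def initially(s):
    """Initially clause: a newly created stack is the empty sequence."""
    return s == ()


def push_pre(s, e):
    """Pre condition of push: there is room for one more element."""
    return len(s) < N


def push_post(s_before, e, s_after):
    """Post condition of push: the old sequence with e appended."""
    return s_after == s_before + (e,)


def pop_pre(s):
    """Pre condition of pop: the stack is not empty."""
    return len(s) > 0


def pop_post(s_before, s_after):
    """Post condition of pop: the old sequence without its last element."""
    return s_after == s_before[:-1]


# The specification talks only about abstract sequences, never about storage.
s0 = ()
s1 = s0 + ("a",)
assert initially(s0) and invariant(s0)
assert push_pre(s0, "a") and push_post(s0, "a", s1) and invariant(s1)
assert pop_pre(s1) and pop_post(s1, s0)
```

Nothing in these predicates mentions arrays or pointers; that is left to the concrete description.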


The  form  contains  a parallel  set  of  descriptions  of  the  concrete  object  and  how  it 
behaves.  In  many  cases  this  makes  the  effect  of  a function  much  easier  to  specify  and  verify 
than  would  the  abstract  description  alone. 

Now,  although  it  is  useful  to  distinguish  between  the  behavior  we  want  and  the  data 
structures  we  operate  on,  we  also  need  to  show  a relationship  that  holds  between  the  two. 
This  is  achieved  with  the  representation  function  rep(x),  which  gives  a  mapping  from  the 
concrete  representation  to  the  abstract  description.  The  purpose  of  a form  verification  is  to 
ensure  that  the  two  invariants  and  the  rep(x)  relation  between  them  are  preserved. 
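
A minimal sketch of a representation function, assuming a bounded stack stored in a fixed-size array with a stack pointer. The names `Concrete`, `v`, `sp`, and `N` are hypothetical and chosen only for this illustration; the code is Python, not Alphard.

```python
# Concrete representation of a bounded stack and its mapping to an
# abstract sequence. All names are illustrative assumptions.
from typing import NamedTuple, Tuple

N = 3  # hypothetical capacity


class Concrete(NamedTuple):
    v: Tuple[int, ...]  # storage array of fixed length N
    sp: int             # number of cells currently in use


def concrete_invariant(x: Concrete) -> bool:
    """Ic(x): the array has the fixed length and sp stays within bounds."""
    return len(x.v) == N and 0 <= x.sp <= N


def rep(x: Concrete) -> Tuple[int, ...]:
    """rep(x): the abstract sequence is the used prefix of the array."""
    return x.v[:x.sp]


# Two concrete states that differ only in unused cells denote the same
# abstract stack, so rep is many-to-one.
x1 = Concrete(v=(7, 8, 0), sp=2)
x2 = Concrete(v=(7, 8, 99), sp=2)
assert concrete_invariant(x1) and concrete_invariant(x2)
assert rep(x1) == rep(x2) == (7, 8)
```

The mapping runs from concrete to abstract only, which is why distinct storage contents may stand for one abstract object.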

In  order  to  verify  a form  we  must  therefore  prove  four  things.  Two  relate  to  the 
representation  itself  and  two  must  be  shown  for  each  function.  Informally,  the  four  required 
steps  are⁶: 

6 We  will  use  Ia(rep(x))  to  denote  the  abstract  invariant  of  an  object  whose  concrete 
representation  is  x,  Ic(x)  to  denote  the  corresponding  concrete  invariant,  italics  to  refer  to 
code  segments,  and  the  names  of  specification  clauses  and  assertions  to  refer  to  those 
formulas.  In  step  4b,  "pre(rep(x'))"  refers  to  the  value  of  x  before  execution  of  the  function. 
A  complete  development  of  the  form  verification  methodology  appears  in  [Wulf76a,b]. 
For the form

1.  Representation validity

Ic(x) ⊃ Ia(rep(x))

2.  Initialization

requires { init clause } initially(rep(x)) ∧ Ic(x)

For each function

3.  Concrete operation

in(x) ∧ Ic(x) { function body } out(x) ∧ Ic(x)

4.  Relation between abstract and concrete

4a.  Ic(x) ∧ pre(rep(x)) ⊃ in(x)

4b.  Ic(x) ∧ pre(rep(x')) ∧ out(x) ⊃ post(rep(x))

Step  1 shows  that  any  legal  state  of  the  concrete  representation  has  a corresponding  abstract 
object  (the  converse  is  deducible  from  the  other  steps).  Step  2 shows  that  the  initial  state 
created  by  the  representation  section  is  legal.  Step  3 is  the  standard  verification  formula  for 
the  concrete  operation  as  a simple  program;  note  that  it  enforces  the  preservation  of  Ic.  Step 
4 guarantees  (a)  that  the  concrete  operation  is  applicable  whenever  the  abstract  pre  condition 
holds  and  (b)  that  if  the  operation  is  performed,  the  result  corresponds  properly  to  the 
abstract  specifications. 
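
To make the four steps concrete, the sketch below checks each of them by brute force for a tiny bounded set stored as a duplicate-free tuple. This is an illustration under stated assumptions, not the paper's method: the universe `U`, the bound `N`, the `insert` operation, and every predicate name are hypothetical, the initialization is taken to have a trivially true requires clause, and exhaustive testing over a small finite domain is a sanity check rather than a proof.

```python
# Brute-force check of the four proof obligations for a tiny bounded set.
# Every name here is a hypothetical illustration, written in Python.
from itertools import product

U = (0, 1, 2)   # element universe (assumption)
N = 2           # capacity bound (assumption)

# Candidate concrete states: all tuples over U up to length N + 1,
# including states that violate the concrete invariant.
STATES = [t for k in range(N + 2) for t in product(U, repeat=k)]


def Ic(x):                        # concrete invariant: short, no duplicates
    return len(x) <= N and len(set(x)) == len(x)


def Ia(s):                        # abstract invariant: at most N members
    return len(s) <= N


def rep(x):                       # representation function
    return frozenset(x)


def initially(s):                 # a new set is empty
    return s == frozenset()


def init():                       # initialization code
    return ()


def pre(s, e):                    # abstract pre condition of insert
    return e in s or len(s) < N


def post(s_before, e, s_after):   # abstract post condition of insert
    return s_after == s_before | {e}


def in_(x, e):                    # concrete input condition
    return e in x or len(x) < N


def out(x_before, e, x_after):    # concrete output condition
    return x_after == (x_before if e in x_before else x_before + (e,))


def insert(x, e):                 # function body
    return x if e in x else x + (e,)


def implies(p, q):
    return (not p) or q


# Step 1: representation validity.
step1 = all(implies(Ic(x), Ia(rep(x))) for x in STATES)
# Step 2: initialization.
step2 = initially(rep(init())) and Ic(init())
# Step 3: the body establishes out and preserves Ic.
step3 = all(
    implies(in_(x, e) and Ic(x), out(x, e, insert(x, e)) and Ic(insert(x, e)))
    for x in STATES for e in U
)
# Step 4a: the abstract pre condition implies the concrete input condition.
step4a = all(
    implies(Ic(x) and pre(rep(x), e), in_(x, e)) for x in STATES for e in U
)
# Step 4b: xp plays the role of x' (the state before), x the state after.
step4b = all(
    implies(Ic(x) and pre(rep(xp), e) and out(xp, e, x), post(rep(xp), e, rep(x)))
    for xp in STATES for x in STATES for e in U
)

assert step1 and step2 and step3 and step4a and step4b
```

In this example step 4a depends on Ic: for the illegal state `(0, 0)` the abstract pre condition for inserting `1` holds, yet the concrete input condition does not.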